(57) A method and apparatus for suppressing
acoustic noise in an acoustic signal (s(n)) represented
by the frequency components of a plurality of frames
each representing a small portion of the acoustic signal,
comprising estimating the average magnitude of noise
in each frequency component over a plurality of frames;
estimating the variability of the magnitude of noise (110)
in each frequency component; and generating denoising
filter components in dependence on the estimated
noise magnitudes, the estimated variability of the noise
magnitude in each frequency component and the
magnitude of each frequency component, and varying the
magnitude of each frequency component (140) in
dependence on the corresponding denoising filter
component.

This has the significant advantage of taking account 
of the variability of the magnitude of noise within each 
frequency component over time, making possible deter- 
mination of an approximate probability of any one fre- 
quency component being largely comprised of noise or 
alternatively of wanted speech signal. 
Description

Field of the Invention 

[0001] The present invention relates to a method and
apparatus for processing an acoustic signal and in
particular for suppressing background acoustic noise from
the acoustic signal.

Background of the Invention

[0002] Portable communication devices such as
cellular telephones often need to detect and transmit a
speech or similar signal in noisy environments such as
a fast moving vehicle. Methods of suppressing
background acoustic noise have thus been developed which
permit much better communication with such devices in
noisy environments.

[0003] The general approach adopted by some such
methods is to represent the overall acoustic signal as
the frequency components of a plurality of frames, each
frame basically representing a small portion (e.g., about
10 ms) of the acoustic signal, and then to attempt to
detect and remove or suppress any noise components
occurring within each frequency component of each frame.
[0004] One simple and crude method estimates the
average magnitude of noise |n(k)| in each frequency
component s(k) over a large number of frames (e.g., of
the order of about 100 frames) and then simply subtracts
this from the magnitude of the frequency component
|s(k)| to generate modified or noise suppressed frequency
component magnitudes |S(k)|. This method can be
enhanced by providing that the magnitudes |S(k)| of the
modified frequency components are never allowed to
fall below a minimum comfort noise floor level η.
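
The subtraction with a comfort noise floor described in [0004] can be sketched as follows. This is a minimal illustration only: the function name, the argument names and the example floor value are assumptions, since the text gives no numerical value for η.

```python
def spectral_subtract(noisy_mags, noise_mags, floor):
    """Subtract the mean noise magnitude from each frequency component.

    noisy_mags: |s(k)| for one frame, one value per frequency component.
    noise_mags: long-term mean noise magnitude |n(k)| per component.
    floor:      comfort noise floor (eta); its value is an assumption here.
    """
    # Each modified magnitude |S(k)| is never allowed to fall below the floor.
    return [max(s - n, floor) for s, n in zip(noisy_mags, noise_mags)]


# Hypothetical two-component frame: the second component is mostly noise.
modified = spectral_subtract([5.0, 1.0], [2.0, 3.0], floor=0.5)
```

In this sketch the second component would go negative after subtraction, so it is clamped to the floor instead.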
[0005] A more sophisticated method, known as Wiener
filtering, multiplies the magnitudes |s(k)| of the
frequency components in each frame by a denoising filter
having components G(k) such that |S(k)| = G(k).|s(k)|,
where the G(k) are generated in respect of each frame
according to the formula:-

[formula missing in source]

where E(|X(k)|^2) is the expected value of the square of
the magnitude of the clean or denoised speech. Clearly,
in a real system this must be estimated (e.g., by
assuming that X(k) ≈ S(k)).
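
For orientation, the textbook form of a Wiener gain is the ratio of expected clean speech power to the sum of expected clean speech power and expected noise power. The sketch below uses that textbook form; it is an assumption and is not taken from the present text, and the function and argument names are hypothetical.

```python
def wiener_gain(clean_power, noise_power):
    """Textbook Wiener gain for one frequency component (assumed form).

    clean_power: estimate of E(|X(k)|^2), the expected clean speech power.
    noise_power: estimate of the expected noise power for the same component.
    """
    total = clean_power + noise_power
    # Guard against an all-zero component to avoid dividing by zero.
    return clean_power / total if total > 0.0 else 0.0


# Hypothetical values: clean power three times the noise power.
g = wiener_gain(3.0, 1.0)
```

In practice the clean power is unknown and has to be estimated, for example from the previous denoised output as the text suggests.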



[0006] In a third method, known as Minimum Mean
Square Estimation (MMSE), the magnitudes |s(k)| of the
frequency components s(k) are again multiplied by
denoising filter components G(k) such that
|S(k)| = G(k).|s(k)|, but in this case, the G(k) are
estimated using modified Bessel functions (which must be
sampled). This method is intensive in the amount of
processing power which it requires (in terms of Millions
of Instructions Per Second (MIPS)) which makes it
unsuitable for portable communication devices where
processing power is at a premium.

[0007] Furthermore, none of the above methods 
takes any account of the variability of the noise compo- 
nents over time. 

Summary of the Invention 

[0008] According to a first aspect of the present inven- 
tion, there is provided a method of suppressing acoustic 
noise in an acoustic signal represented by the frequency 
components of a plurality of frames, each frame repre- 
senting a small portion of the acoustic signal, comprising 
the steps of estimating the average magnitude of noise 
in each frequency component over a plurality of frames, 
estimating the variability of the magnitude of noise in 
each frequency component; and generating denoising 
filter components in dependence on the estimated mag- 
nitude of noise in each frequency component, the esti- 
mated variability of the magnitude of noise in each fre- 
quency component and the magnitude of each frequen- 
cy component, and varying the magnitude of each fre- 
quency component in dependence on the correspond- 
ing denoising filter component. 

[0009] This method has the significant advantage of 
taking account of the variability of the magnitude of 
noise within each frequency component over time. In 
this way, it is possible to determine an approximate 
probability of any one frequency component being 
largely comprised of noise or alternatively of being 
largely comprised of wanted speech signal. 
[0010] Preferably, the method further comprises set- 
ting the filter components in dependence on the ratio of 
the magnitude of each frequency component to an es- 
timated likely maximum magnitude of noise for that fre- 
quency component, whereby, if the ratio exceeds a pre- 
determined amount for a given frequency component, 
the filter component corresponding to such a frequency 
component may be set to a maximum value which is 
preferably substantially equal to one, whereas, if the ra- 
tio is less than a second predetermined amount, the cor- 
responding filter component may be set to a minimum 
value, which is preferably substantially equal to 0.15. In 
one preferred embodiment, the filter components are 
varied in a linear dependence on the ratio of the
magnitude of each frequency component to the estimated
likely maximum magnitude of noise for that frequency
component between a minimum value of the filter
components at or below the second predetermined amount
and a maximum value at or above the first
predetermined value.

[0011] Having a maximum value of about 1 for each filter
component ensures that signals which are much larger
than the maximum likely noise content are not
attenuated. This is beneficial for frequency components
which are much larger than the likely maximum noise
component: since the phase of the noise component is not
necessarily aligned with the phase of the speech (or
similar) signal, the noise component is almost as likely
to interfere destructively with the speech signal (thus
reducing the overall magnitude of the frequency component
relative to the clean speech equivalent) as to interfere
constructively with it (thus increasing the magnitude of
the frequency component relative to the clean speech
equivalent).

[0012] According to a further preferred embodiment 
of the present invention, prior to calculating the ratio of 
the magnitude of each frequency component to the es- 
timated likely maximum magnitude of noise for that fre- 
quency component, the magnitudes of the frequency 
components are filtered to remove high frequency fluc- 
tuations thereof. This filtering of the magnitudes of the 
frequency components is preferably achieved by gen- 
erating a short term mean estimation of the mean mag- 
nitudes of the frequency components, preferably over 
approximately three frames. 

[0013] According to a second aspect of the present 
invention, there is provided an apparatus for suppress- 
ing acoustic noise in an acoustic signal represented by 
the frequency components of a plurality of frames, each 
frame representing a small portion of the acoustic sig- 
nal, comprising: means for estimating the average mag- 
nitude of noise in each frequency component over a plu- 
rality of frames; means for estimating the variability of 
the magnitude of noise in each frequency component; 
means for generating denoising filter components in de- 
pendence on the estimated magnitude of noise in each 
frequency component, the estimated variability of the 
magnitude of noise in each frequency component and 
the magnitude of each frequency component, and 
means for varying the magnitude of each frequency 
component in dependence on the corresponding de- 
noising filter component. 

Brief Description of the Drawings 

[0014] In order that the present invention may be bet- 
ter understood, embodiments thereof will now be de- 
scribed, by way of example only, with reference to the 
accompanying drawings in which:- 

FIG. 1 is a block diagram of apparatus or method
steps suitable for carrying out the present invention;
and

FIG. 2 is a more detailed block diagram of noise
suppression apparatus or method steps in the
apparatus or method of FIG. 1.

Detailed Description of the Invention 

[0015] Referring firstly to FIG. 1, there is shown a
series of steps or apparatus blocks showing the overall
approach of noise suppression according to the present
invention. Considering FIG. 1 initially as representing a
series of method steps, these are now described in
detail below.

[0016] The first step 10 is to take the acoustic signal 
s(n) (in the form of digital audio signal amplitude sam- 
ples) and to perform high pass filtering to remove low 
frequency components (which do not carry much 
speech signal information although they may contain a 
large amount of unwanted background acoustic noise). 
[0017] The second step 20 windows and overlaps (for 
example, by 50%) the high pass filtered acoustic signal. 
This step involves separating the signal into a series of 
overlapping segments and windowing them to form 
frames so that at the edge of each frame the amplitude 
of the signal is zero. 

[0018] The third step 30 performs the Fast Fourier
Transform on each windowed vector. Given a 256-sample
input signal vector s(n), we obtain a 256-point vector
s(k), where n and k are respectively the time and
frequency indices. In what follows we shall indicate
spectral data with bold characters: n, s, ...

[0019] The fourth step 40 performs a transformation
of the FFT outputs, from Cartesian to polar co-ordinates.
[0020] The fifth step 50 uses the magnitude of the 
Fourier Transform, to evaluate the mean magnitude of 
spectral background noise mag(n(k)). 
[0021] The sixth step 60 performs the estimation of
de-noised speech spectral magnitude mag(S(k)) using
the noise evaluation from block 50, and the noisy
speech spectral magnitude mag(s(k)).

[0022] The seventh, eighth and ninth steps (70, 80, 90)
perform the inverse operations to those performed
by steps 40, 30 and 20 respectively: conversion from
polar to Cartesian co-ordinates, inverse Fourier
transform and overlap-add. It is to be noted that the
signal phase is not modified by the algorithm, since the
noisy speech phase is used to reconstruct the clean
speech signal in step 70.
The main structure of this algorithm is very classical.
The innovative feature of the algorithm is in the way
noise is removed from speech in step 60. This step is
now described in detail.
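
The frame-based structure of steps 20 to 90 can be sketched as below. This is an illustrative sketch under stated assumptions, not the implementation described here: it uses a Hann window (the text does not name the window), a naive discrete Fourier transform in place of an FFT, and a hypothetical callback `gain_fn` standing in for steps 50 and 60 and the mixer 140. The high pass filtering of step 10 is omitted.

```python
import cmath
import math


def hann(n):
    # Periodic Hann window (assumed): copies shifted by n/2 sum to exactly 1.
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def dft(x):
    # Naive O(n^2) transform, used here only to keep the sketch self-contained.
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]


def process(signal, frame_len, gain_fn):
    hop = frame_len // 2                     # 50% overlap (step 20)
    win = hann(frame_len)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * win[i] for i in range(frame_len)]
        spec = dft(frame)                    # step 30
        polar = [cmath.polar(c) for c in spec]   # step 40: (magnitude, phase)
        gains = gain_fn([m for m, _ in polar])   # steps 50-60 (hypothetical)
        # Mixer 140 and step 70: scale the magnitude, keep the noisy phase.
        new_spec = [cmath.rect(g * m, ph) for g, (m, ph) in zip(gains, polar)]
        frame_out = idft(new_spec)           # step 80
        for i in range(frame_len):           # step 90: overlap-add
            out[start + i] += frame_out[i]
    return out
```

With a gain of 1 in every component, samples covered by two overlapping frames are reconstructed unchanged, which is a convenient check that the analysis and synthesis stages are symmetrical.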

[0023] Referring now to FIG. 2, the step 60 can be 
subdivided into 3 sub-steps. 

[0024] The first sub-step 110 is dedicated to
evaluating the noise variance. Step 50 output is the mean
magnitude of background noise. Thus on speechless frames
input data can be used to evaluate the noise variance

σ(k) = mean(mag(s(k)-n(k)))/mag(n(k)).

In fact the variance is obtained by low-pass filtering
the selected speech-free s(k):

σ^p(k) = δ·σ^(p-1)(k) + (1-δ)·mag(s(k)-n(k))/mag(n(k))

where superscript p indicates the number of the
speechless frame. For a given frequency channel, a sample
is considered to be only noise if mag(s(k)) < 4*mag(n(k)).
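
The recursive update of the noise variability can be sketched for a single frequency channel as follows. This is an illustration only: the function name, the default smoothing coefficient `delta=0.9` and the reading of mag(s(k)-n(k)) as the absolute difference between the two magnitudes are assumptions, since the text gives no value for the coefficient.

```python
def update_variability(sigma_prev, noisy_mag, noise_mag, delta=0.9):
    """One-frame update of the noise variability for one frequency channel.

    sigma_prev: variability estimate from the previous speechless frame.
    noisy_mag:  mag(s(k)) for the current frame.
    noise_mag:  mean noise magnitude mag(n(k)) from step 50 (must be > 0).
    delta:      smoothing coefficient; the default is an assumed value.
    """
    # The sample counts as noise only if mag(s(k)) < 4*mag(n(k));
    # otherwise the previous estimate is kept unchanged.
    if noisy_mag >= 4.0 * noise_mag:
        return sigma_prev
    deviation = abs(noisy_mag - noise_mag) / noise_mag
    return delta * sigma_prev + (1.0 - delta) * deviation
```

Frames judged to contain speech leave the estimate untouched, so the variability tracks the noise alone.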
[0025] The second sub-step 120 is dedicated to
evaluating the input signal short term mean M(k) (a
smoothed version of s(k)). It is obtained using a
one-tap IIR filter:

M^q(k) = γ·M^(q-1)(k) + (1-γ)·mag(s(k))

where superscript q indicates the frame number.
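
The short term mean can be sketched as a per-channel recursive average. The function name and the default coefficient `gamma=0.7` are assumptions; the text does not state a value for the coefficient, only that the mean is preferably taken over approximately three frames.

```python
def smooth_magnitudes(prev_mean, mags, gamma=0.7):
    """Update the short term mean M(k) for every frequency channel.

    prev_mean: M(k) from the previous frame, one value per channel.
    mags:      mag(s(k)) for the current frame.
    gamma:     smoothing coefficient; the default is an assumed value.
    """
    return [gamma * m_prev + (1.0 - gamma) * m
            for m_prev, m in zip(prev_mean, mags)]


# Hypothetical two-channel example with equal weighting of old and new data.
mean = smooth_magnitudes([0.0, 2.0], [4.0, 2.0], gamma=0.5)
```

A larger coefficient gives a smoother but slower-reacting mean.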
[0026] The third sub-step 130 is dedicated to
calculating the denoising filter gain for each frequency
channel. It is done as follows:

• If M(k)/n(k) < 1 then it is considered that there is
only noise and the minimum gain factor K is applied,
thus G(k)=K.

• If M(k)/n(k) > 1+βσ(k) it is considered that noise is
negligible and G(k) is set to 1, which is the maximum
gain factor (β is typically equal to 20).

• In between we use a linear interpolation to calculate
the gain factor: G(k) = K + (1-K)·(M(k)/n(k)-1)/(βσ(k)).
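
The three cases above can be sketched for one frequency channel as follows. The defaults `k_min=0.15` and `beta=20.0` follow the values mentioned in the text; the function and argument names are hypothetical, and treating a ratio exactly equal to the upper threshold as the maximum-gain case is an assumption made to avoid dividing by zero when the variability is zero.

```python
def denoising_gain(smoothed_mag, noise_mag, sigma, k_min=0.15, beta=20.0):
    """Gain G(k) for one frequency channel.

    smoothed_mag: short term mean M(k) of the noisy magnitude.
    noise_mag:    mean noise magnitude for the channel (must be > 0).
    sigma:        estimated noise variability for the channel.
    """
    ratio = smoothed_mag / noise_mag
    if ratio < 1.0:
        return k_min                  # only noise: minimum gain K
    if ratio >= 1.0 + beta * sigma:
        return 1.0                    # noise negligible: maximum gain
    # Linear interpolation between K and 1 across the transition band.
    return k_min + (1.0 - k_min) * (ratio - 1.0) / (beta * sigma)
```

The interpolation is continuous at both ends: it gives K at a ratio of 1 and reaches 1 at a ratio of 1+βσ(k).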

[0027] The last operation of the algorithm consists in
applying, at mixer 140, the gain to the noisy speech
spectral magnitude to obtain the estimation of the clean
speech: mag(S(k))=G(k)*mag(s(k)).



Claims 

1. A method of suppressing acoustic noise in an 
acoustic signal (s(k)) represented by the frequency 
components of a plurality of frames, each frame 
representing a small portion of the acoustic signal, 
comprising the steps of estimating the average 
magnitude of noise (50) in each frequency compo- 
nent over a plurality of frames, estimating the vari- 
ability of the magnitude of noise (110) in each fre- 
quency component; and generating de-noising filter 
components (60) in dependence on the estimated 
magnitude of noise in each frequency component, 
the estimated variability of the magnitude of noise 
in each frequency component and the magnitude of 
each frequency component, and varying the mag- 
nitude of each frequency component (140) in de- 
pendence on the corresponding de-noising filter 
component. 

2. The method according to claim 1 further comprising
calculating the ratio of the magnitude of each
frequency component to an estimated likely maximum
magnitude of noise for that frequency component
and setting the filter components in dependence on
the calculated ratio for that frequency component,
whereby, if the ratio exceeds a predetermined
amount, which depends on the noise estimated
variability of the magnitude, for a given frequency
component, the filter component corresponding to such
a frequency component may be set to a maximum
value, whereas, if the ratio is less than a second
predetermined amount, the corresponding filter
component may be set to a minimum value.

3. The method according to claim 2 wherein the filter
components are varied in a linear dependence on
the ratio of the magnitude of each frequency
component to the estimated likely maximum magnitude
of noise for that frequency component between a
minimum value of the filter components at or below
the second predetermined amount and a maximum
value at or above the first predetermined value.

4. The method according to claim 2 or 3 wherein the
first predetermined value is in a linear dependence
to the estimated variability of the noise variability.

5. The method according to claim 2 or 3 wherein the 
minimum value is substantially equal to 0.15. 


6. The method according to claim 4 wherein the
variability of the noise magnitude for a given frequency
component is estimated by filtering on speechless
frames, with a one tap IIR filter, the distance between
the noise estimated magnitude and the noisy
speech magnitude, divided by the noise estimated
magnitude.

7. The method according to claim 2 wherein prior to
calculating the ratio of the magnitude of each
frequency component to the estimated likely maximum
magnitude of noise for that frequency component,
the magnitudes of the frequency components are
filtered to remove high frequency fluctuations
thereof.

8. The method according to claim 6 wherein the
filtering of the magnitudes of the frequency components
is achieved by generating a short term mean
estimation of the mean magnitudes of the frequency
components.

9. An apparatus for suppressing acoustic noise in an
acoustic signal (s(k)) represented by the frequency
components of a plurality of frames, each frame
representing a small portion of the acoustic signal,
comprising:

means for estimating the average magnitude of
noise in each frequency component over a
plurality of frames (50);

means for estimating the variability of the
magnitude of noise in each frequency component
(110);

means for generating de-noising filter
components (60) in dependence on the estimated
magnitude of noise in each frequency component,
the estimated variability of the magnitude
of noise in each frequency component and
the magnitude of each frequency component,
and

means for varying the magnitude of each
frequency component (140) in dependence on the
corresponding de-noising filter component.